To Be or Not to Be a Zero Pronoun: a Machine Learning Approach for Romanian
Authors
Abstract
This paper presents a new study on the distribution and identification of zero pronouns in Romanian. A Romanian corpus that includes legal, encyclopaedic, literary, and news texts has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments have been performed on this corpus to identify verbs that have a zero pronoun in the subject position. The evaluation results show that zero pronouns appear frequently in Romanian and that their distribution depends largely on the genre. Additionally, a search scope for the antecedent has been determined, increasing the chances of correct resolution. Furthermore, more than 70% of the zero pronouns have been accurately identified by various machine learning algorithms. The strong similarity between our results and those obtained for other Romance languages supports our conclusions.
Similar resources
Debt Collection Industry: Machine Learning Approach
Businesses are increasingly interested in how big data, artificial intelligence, machine learning, and predictive analytics can be used to increase revenue, lower costs, and improve their business processes. In this paper, we describe how we have developed a data-driven machine learning method to optimize the collection process for a debt collection agency. Precisely speaking, we create a frame...
A Deep Neural Network for Chinese Zero Pronoun Resolution
This paper investigates the problem of Chinese zero pronoun resolution. Most existing approaches are based on machine learning algorithms that use hand-crafted features, which are labor-intensive to build. Moreover, semantic information that is essential in the resolution of noun phrases has not been addressed enough by previous approaches to zero pronoun resolution. This is because zero pronouns have...
Resolving Romanian Zero Pronouns: A Machine Learning Approach
This paper presents a new study on the distribution, identification, and resolution of zero pronouns in Romanian. A Romanian corpus, including legal, encyclopaedic, literary, and news texts, has been created and manually annotated for zero pronouns. Using a morphological parser for Romanian and machine learning methods, experiments were performed on the created corpus for the identification and ...
Identification and Resolution of Chinese Zero Pronouns: A Machine Learning Approach
In this paper, we present a machine learning approach to the identification and resolution of Chinese anaphoric zero pronouns. We perform both identification and resolution automatically, with two sets of easily computable features. Experimental results show that our proposed learning approach achieves anaphoric zero pronoun resolution accuracy comparable to a previous state-of-the-art, heuristi...
An Effective Approach for Robust Metric Learning in the Presence of Label Noise
Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...